Faster Text Fingerprinting

Authors

  • Roman Kolpakov
  • Mathieu Raffinot
Abstract

Let s = s1..sn be a text (or sequence) over a finite alphabet Σ. A fingerprint in s is the set of distinct characters contained in one of its substrings. Fingerprinting a text consists of computing the set F of all fingerprints of all its substrings. A fingerprint f ∈ F admits a number of maximal locations 〈i, j〉 in s, that is, the alphabet of si..sj is f and si−1 and sj+1, if defined, are not in f. The set of maximal locations is L, with |L| ≤ n|Σ|. Two maximal locations 〈i, j〉 and 〈k, l〉 such that si..sj = sk..sl are called copies, and the quotient of L by the copy relation is denoted LC. The fastest known algorithm to compute all fingerprints in s runs in O(n + |L| log |Σ|) time. We present an O((n + |LC|) log |Σ|) algorithm that is almost always faster.
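To make the definitions concrete, the following is a minimal brute-force sketch that enumerates F, L, and LC for a short string. It is not the algorithm of the paper: it runs in quadratic time and only illustrates the definitions. The function names `maximal_locations` and `copy_classes` are hypothetical, chosen for this illustration.

```python
from collections import defaultdict


def maximal_locations(s):
    """Map each fingerprint (a frozenset of characters) to its maximal
    locations, given as 1-based inclusive pairs (i, j).

    Brute force over all substrings: an illustration of the definitions,
    not the algorithm described in the abstract.
    """
    n = len(s)
    result = defaultdict(list)
    for i in range(n):
        seen = set()  # alphabet of s[i..j], grown as j advances
        for j in range(i, n):
            seen.add(s[j])
            # Maximal: the neighbours, if they exist, are not in the fingerprint.
            left_ok = i == 0 or s[i - 1] not in seen
            right_ok = j == n - 1 or s[j + 1] not in seen
            if left_ok and right_ok:
                result[frozenset(seen)].append((i + 1, j + 1))
    return dict(result)


def copy_classes(s, locations):
    """Group maximal locations that spell the same substring (copies).

    The number of groups is |LC|, the size of the quotient of L by the
    copy relation.
    """
    classes = defaultdict(list)
    for i, j in locations:
        classes[s[i - 1:j]].append((i, j))
    return dict(classes)


s = "abac"
by_fingerprint = maximal_locations(s)
all_locations = [loc for locs in by_fingerprint.values() for loc in locs]
classes = copy_classes(s, all_locations)
print(len(by_fingerprint), len(all_locations), len(classes))
```

For s = abac, the fingerprint {a} has two maximal locations, 〈1, 1〉 and 〈3, 3〉. Both spell the substring "a", so they are copies and count once in LC.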

Similar resources

A Faster Query Algorithm for the Text Fingerprinting Problem

Article history: Received 15 February 2009 Revised 14 January 2011 Available online 7 April 2011

Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting

With due respect to authors' rights, plagiarism detection is one of the critical problems in the field of text mining, and many researchers are interested in it. The issue is considered a serious one in higher academic institutions. There exist language-independent tools, but they do not yield reliable results because they ignore the special features of each language. Considering the paucit...

Information Hiding for Text by Paraphrasing

Digital fingerprinting is receiving growing attention as a technology for resolving copyright problems. Previously, researchers have been interested only in image-based digital fingerprinting, where secret information is hidden in images, and text has not been the main target of information hiding. In this paper, we propose an information hiding method for text. Our information hiding method is bas...

Digital Fingerprinting Based on Keystroke Dynamics

Digital fingerprinting is an important but still challenging aspect of network forensics. This paper introduces an effective way to identify an attacker based on a strong behavioral biometric. We introduce a new passive digital fingerprinting technique based on keystroke dynamics biometrics. The technique is based on free text detection and analysis of keystroke dynamics. It allows building a b...

Fingerprinting of Digital Information—Introduction and some Preliminary Results

Coding methods for fingerprinting digital information are considered, with the aim of deterring users from copyright violation. A general model for discrete fingerprinting is presented, along with a simple embedding method for text documents. Different types of attacks are discussed, including attacks from colluding pirates. Some preliminary results are derived for random fingerprinting codes. ...



Publication date: 2008